Data Pre-processing

Data Cleaning

For this dataset, the data cleaning process consists of:

  1. Converting variables to appropriate data types
  2. Removing null values
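The two cleaning steps above can be sketched in pandas as follows. This is a minimal illustration, not the original pipeline: the column names (`Order ID`, `Order Date`, `Amount`, `Profit`, `Quantity`), the day-month-year date format, the toy rows and the helper `clean` are all assumptions.

```python
import pandas as pd

# Hypothetical raw rows read in as strings; column names and the
# day-month-year date format are assumptions about the dataset.
raw = pd.DataFrame({
    "Order ID": ["ORD-1", "ORD-2", "ORD-3", None],
    "Order Date": ["01-04-2018", "01-04-2018", "03-04-2018", "05-04-2018"],
    "Amount": ["1275", "66", None, "80"],
    "Profit": ["-1148", "-12", "5", "-56"],
    "Quantity": ["7", "5", "1", "2"],
})

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: fix dtypes, then drop rows with null values."""
    df = df.copy()
    # Step 1: parse dates and numbers; unparseable values become NaT/NaN.
    df["Order Date"] = pd.to_datetime(df["Order Date"], format="%d-%m-%Y", errors="coerce")
    for col in ["Amount", "Profit", "Quantity"]:
        df[col] = pd.to_numeric(df[col], errors="coerce")
    # Step 2: remove any row that still contains a null value.
    return df.dropna().reset_index(drop=True)

cleaned = clean(raw)
print(cleaned.dtypes)
```

Converting types before dropping nulls means values that fail to parse are also caught by `dropna`.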

Next, a new dataframe is created containing the Amount, Profit and Quantity of each order. It is then joined with the Orders dataset using Order ID as the primary key.
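A possible sketch of this aggregate-then-join step is shown below, assuming an order can span several detail rows and that the new dataframe holds per-order totals. The toy data and column names are hypothetical.

```python
import pandas as pd

# Hypothetical order-detail rows (several items per order are possible).
details = pd.DataFrame({
    "Order ID": ["ORD-1", "ORD-1", "ORD-2", "ORD-3"],
    "Amount": [100.0, 50.0, 200.0, 80.0],
    "Profit": [10.0, -5.0, 30.0, 8.0],
    "Quantity": [1, 2, 3, 1],
})
# Hypothetical order-level table with customer location.
orders = pd.DataFrame({
    "Order ID": ["ORD-1", "ORD-2", "ORD-3"],
    "State": ["Madhya Pradesh", "Maharashtra", "Tamil Nadu"],
    "City": ["Indore", "Mumbai", "Chennai"],
})

# Total Amount, Profit and Quantity for each order.
order_totals = details.groupby("Order ID", as_index=False)[["Amount", "Profit", "Quantity"]].sum()

# Join on Order ID; validate="one_to_one" raises if the key is duplicated on either side.
merged = orders.merge(order_totals, on="Order ID", how="inner", validate="one_to_one")
print(merged)
```

The `validate` check is an optional safeguard that the key really is unique in both tables before they are combined.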


Sales Trend Analysis

Trend analysis looks for patterns in data, such as ups and downs. A “trend” is an upward or downward shift in a dataset over time. In retail, analyzing past trends in sales or revenue helps predict future market behavior, which makes it useful for budgeting and forecasting. Plotting a business's total sales along a trend line can reveal significant information.
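One way to fit a trend line to total sales is sketched below: aggregate sales by month, then fit a least-squares straight line. The toy data, the `Order Date`/`Amount` column names and the choice of a linear trend are assumptions, not necessarily what the original analysis used.

```python
import numpy as np
import pandas as pd

# Hypothetical order-level sales; column names are assumptions.
sales = pd.DataFrame({
    "Order Date": pd.to_datetime([
        "2018-04-03", "2018-04-20", "2018-05-11",
        "2018-06-02", "2018-06-15", "2018-07-09",
    ]),
    "Amount": [1200.0, 800.0, 1500.0, 900.0, 600.0, 2100.0],
})

# Total sales per calendar month.
monthly = sales.groupby(sales["Order Date"].dt.to_period("M"))["Amount"].sum()

# Least-squares straight line through the monthly totals:
# a positive slope means sales trend upward over the period.
x = np.arange(len(monthly))
slope, intercept = np.polyfit(x, monthly.to_numpy(), deg=1)
print(f"trend: {slope:+.1f} per month")
```

With these toy numbers the monthly totals dip and then recover, and the fitted slope is a modest +30 per month.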



Customer Demographic Analysis

Customer demographics are categories of consumer populations that are relevant to a business' purposes, such as marketing and product design. The term also refers to the study of such categories in a business context.

The state with the highest quantity sold is Madhya Pradesh, followed by Maharashtra and Rajasthan. The biggest gap is between Maharashtra and Rajasthan, a difference of 58 units. Among cities, Indore and Mumbai lead by a very wide margin. Chennai, Allahabad and Amritsar have the lowest quantities, with fewer than 10 units sold each.
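The rankings and gaps described above can be computed with a grouped sum. This sketch uses hypothetical toy rows, not the project's data, and assumes the merged table has `State`, `City` and `Quantity` columns.

```python
import pandas as pd

# Hypothetical merged order rows; values are illustrative only.
merged = pd.DataFrame({
    "State": ["Madhya Pradesh", "Madhya Pradesh", "Maharashtra",
              "Maharashtra", "Rajasthan", "Tamil Nadu"],
    "City": ["Indore", "Bhopal", "Mumbai", "Pune", "Jaipur", "Chennai"],
    "Quantity": [20, 4, 15, 3, 6, 2],
})

# Units sold per state and per city, highest first.
state_qty = merged.groupby("State")["Quantity"].sum().sort_values(ascending=False)
city_qty = merged.groupby("City")["Quantity"].sum().sort_values(ascending=False)

# Gap between each state and the state ranked just above it.
gaps = -state_qty.diff()
largest_gap_state = gaps.idxmax()  # state at the lower end of the biggest gap

# Cities selling fewer than 10 units.
low_cities = city_qty[city_qty < 10].index.tolist()
print(state_qty, largest_gap_state, low_cities, sep="\n")
```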


Sales Target

A sales target is a goal set for a salesperson or sales department measured in revenue or units sold for a specific time.

The bar graph above compares the target and actual profit for each category. No category meets, let alone surpasses, its target. The most disappointing category is Furniture, which earned a profit of Rs. 2298 against a target of Rs. 11.8K.
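A small sketch of how shortfall and target attainment could be tabulated per category is shown below. The helper `target_gap` and its column names are hypothetical. Only the Furniture figures come from the chart, and "11.8K" is assumed to mean exactly Rs. 11,800.

```python
import pandas as pd

def target_gap(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add shortfall and attainment to a Category/Target/Profit table."""
    out = df.copy()
    out["Shortfall"] = out["Target"] - out["Profit"]
    out["Attainment %"] = 100 * out["Profit"] / out["Target"]
    out["Met Target"] = out["Profit"] >= out["Target"]
    return out

# Furniture figures from the chart; the target is assumed to be exactly 11,800.
report = target_gap(pd.DataFrame({
    "Category": ["Furniture"],
    "Target": [11_800.0],
    "Profit": [2_298.0],
}))
print(report)
```

Under that assumption, Furniture reaches roughly 19.5% of its target, a shortfall of Rs. 9,502.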


Customer Segmentation via Cluster Analysis

Cluster analysis uses mathematical models to discover groups of similar customers based on the smallest variations among customers within each group.

Cluster Analysis

Cluster analysis is the use of a mathematical model to discover groups of similar customers based on finding the smallest variations among customers within each group. The goal of cluster analysis in marketing is to accurately segment customers in order to achieve more effective customer marketing via personalization. A common cluster analysis method is a mathematical algorithm known as k-means cluster analysis, sometimes referred to as scientific segmentation. The resulting clusters support better customer modeling and predictive analytics, and are also used to target customers with offers and incentives personalized to their wants, needs and preferences.
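A minimal k-means sketch using scikit-learn is shown below. The per-customer features (total amount, total quantity, number of orders) and their values are hypothetical; the original analysis may have used different features or preprocessing.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-customer features: [total Amount, total Quantity, number of orders].
X = np.array([
    [500, 3, 1], [650, 4, 1], [700, 5, 2],
    [3000, 20, 6], [3200, 22, 7], [2900, 18, 6],
    [9000, 60, 15], [9500, 65, 16], [8800, 58, 14],
], dtype=float)

# Standardize so no single feature dominates the Euclidean distance.
X_scaled = StandardScaler().fit_transform(X)

# k-means alternates between assigning each customer to the nearest centroid
# and moving each centroid to the mean of its members until assignments settle.
km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(X_scaled)
print(labels)
```

Scaling matters here because the raw amounts are hundreds of times larger than the order counts.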

A k-value of 3 is the best choice of this hyperparameter for our model because, beyond k = 3, the curve follows a roughly linear trend, so additional clusters add little.
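Choosing k from where the curve turns linear resembles the elbow method. A sketch of that approach follows, assuming the curve plots k-means inertia (the within-cluster sum of squared distances) against k. The features and values are the same hypothetical ones used for illustration, not the project's data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-customer features: [total Amount, total Quantity, number of orders].
X = np.array([
    [500, 3, 1], [650, 4, 1], [700, 5, 2],
    [3000, 20, 6], [3200, 22, 7], [2900, 18, 6],
    [9000, 60, 15], [9500, 65, 16], [8800, 58, 14],
], dtype=float)
X_scaled = StandardScaler().fit_transform(X)

# Inertia for each candidate k.
inertias = {}
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    inertias[k] = km.inertia_

# Reduction in inertia gained by moving from k-1 to k clusters;
# the elbow is where these reductions become small and roughly constant.
drops = {k: inertias[k - 1] - inertias[k] for k in range(2, 7)}
print(inertias)
```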

Segment 1: Medium Buyers
Segment 2: Loyal Buyers
Segment 3: Occasional Buyers